A novel prediction approach using wavelet transform and grey multivariate convolution model

It is crucial to develop highly accurate forecasting techniques for electricity consumption in order to monitor and anticipate its evolution. In this work, a novel version of the discrete grey multivariate convolution model (ODGMC(1,N)) is proposed. A linear corrective term is included in the conventional GMC(1,N) structure, parameter estimation is carried out in a manner consistent with the modelling process, and an iterative technique is used to obtain the cumulated forecasting function of ODGMC(1,N). As a result, the forecasting capacity of ODGMC(1,N) is more reliable and its stability is enhanced. For validation purposes, ODGMC(1,N) is applied to forecast Cameroon's annual electricity demand. The results show that the novel model scores 1.74% MAPE and 132.16 RMSE and is more precise than competing models.
• ODGMC(1,N) corrects the linear impact of t on the forecasting performance.
• Wavelet transform is used to remove irrelevant information from input data.
• The proposed model can be used to track annual electricity demand.


Introduction
It is essential to project future electricity needs since doing so lays the groundwork for improved utility-wide decisions [1]. The more accurate the forecasts, the better the decisions, which results in a utility that is more efficient and dependable and can better meet society's need for electric power. As a result, accurate forecasts are a tool for the grid's safe and dependable operation as well as for promoting cost-effective operation by optimizing production scheduling.
The legal forecasting requirement established by the Electricity Sector Regulatory Agency is one of the factors driving forecasting studies by electricity distribution system operators. According to the laws and network rules, information transfer to established contractual partners in the Central African and Cameroonian markets is required. Additionally, the regulations and related laws call for the payment of fairly substantial penalties against distributors in situations when they are the source of unbalanced electricity off-take from the grid. In these situations, distributors are compelled to provide forecasts, particularly during the dry season, when residential end-user usage might increase significantly because of the excessive heat conditions. The requirement for determining future investments in energy markets (including renewable energy sources), for enhancing energy efficiency, and for prioritizing energy investment projects also drives the development of electricity forecasting systems.
With unexpectedly high price spikes and daily seasonality, power price trends have peculiar characteristics. The need for accurate price estimates emerges as one of the fundamental challenges that the various market participants face [2], and as a result, research on electricity markets has taken centre stage in the electricity industry. Making a profit is the primary goal of these participants' activities. It is crucial to keep in mind, however, that the stability of the grid is the primary goal of the electricity markets themselves [3]. The more price volatility increases, the more the stability of the system is threatened and the greater the likelihood of a blackout. Because of this, accurately forecasting electricity consumption, which requires knowledge of future pricing, could improve the grid's stability.
The network operating margins must be in line with consumer demand and temporal variances for all of the aforementioned reasons. To some extent, operations must be adaptable in order to account for changes in demand. Predictive models developed for certain consumer categories, such as residential end users, are used to reconcile power demand fluctuations with system operational constraints. From the distributor's perspective, precise demand forecasting will lower operational expenses and remove penalties that can result from imbalances between supply and demand.

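Definition 1. The GMC(1,N) model is defined by the differential equation

$$\frac{dx_1^{(1)}(t)}{dt} + a\,x_1^{(1)}(t) = \sum_{i=2}^{N} b_i\,x_i^{(1)}(t) + u \qquad (5)$$

where $x_i^{(1)}$ is the 1-AGO series of $x_i^{(0)}$, $b_i x_i^{(1)}(t)$ and $b_i$ are the driving terms and driving coefficients respectively, $u$ is the control term, and $-a$ is referred to as the system development coefficient.

The differential equation (DE) model family includes GMC(1,N), as shown in Definition 1. The right-hand side (RHS) of Eq. (5) is generally not a constant, in contrast to the conventional GM(1,1). As a result, the GMC(1,N) model's time response function (sometimes called the cumulated forecast function [5]), i.e. the solution to Eq. (5), is substantially more complicated than that of the GM(1,1) model. Ref. [4] first approximates Eq. (5) by a difference equation (Eq. (6b)) by considering the RHS of Eq. (5) to be a function $f(t)$, as in Eq. (6a):

$$f(t) = \sum_{i=2}^{N} b_i\,x_i^{(1)}(t) + u \qquad (6a)$$

$$x_1^{(0)}(k) + a\,z_1^{(1)}(k) = \sum_{i=2}^{N} b_i\,z_i^{(1)}(k) + u, \quad z_i^{(1)}(k) = \tfrac{1}{2}\left[x_i^{(1)}(k) + x_i^{(1)}(k-1)\right], \quad k = 2, 3, \ldots, n \qquad (6b)$$

Eq. (6b) is obtained by integrating both sides of Eq. (5) over the interval $[k-1, k]$ and then applying the trapezoidal integration rule to the remaining unknown terms. In terms of the parameters $\theta = [a, b_2, \ldots, b_N, u]^T$, the difference equation (Eq. (6b)) is a set of linear equations. Specifically:

$$Y = B\,\theta \qquad (7)$$

where:

$$B = \begin{bmatrix} -z_1^{(1)}(2) & z_2^{(1)}(2) & \cdots & z_N^{(1)}(2) & 1 \\ -z_1^{(1)}(3) & z_2^{(1)}(3) & \cdots & z_N^{(1)}(3) & 1 \\ \vdots & \vdots & & \vdots & \vdots \\ -z_1^{(1)}(n) & z_2^{(1)}(n) & \cdots & z_N^{(1)}(n) & 1 \end{bmatrix}, \qquad Y = \begin{bmatrix} x_1^{(0)}(2) \\ x_1^{(0)}(3) \\ \vdots \\ x_1^{(0)}(n) \end{bmatrix}$$

If $\det(B^T B) \neq 0$ (which implies that the matrix $B^T B$ can be inverted), then the parameters $\theta = [a, b_2, \ldots, b_N, u]^T$ can be estimated by the least-squares method:

$$\hat{\theta} = (B^T B)^{-1} B^T Y$$

Going further, Tien [4] solved the DE given in Eq. (5) using the initial condition (when $t = 1$) $\hat{x}_1^{(1)}(1) = x_1^{(0)}(1)$, and obtained:

$$\hat{x}_1^{(1)}(t) = x_1^{(0)}(1)\,e^{-a(t-1)} + \int_1^t e^{-a(t-\tau)} f(\tau)\,d\tau \qquad (10)$$

Eq. (10) is known as the cumulated forecast function of GMC(1,N). Ref. [4] provides a thorough explanation of how the solution was derived. However, it is still challenging to obtain an explicit expression because of the convolution integral on the RHS of Eq. (10). Fortunately, an approximate solution can be obtained by numerical integration. The trapezoid formula is a straightforward and widely used technique that yields the more precise time response function shown below (other high-precision numerical integration rules, such as the Gaussian formula, could also be used to calculate the convolution integral in Eq. (10) [14]):

$$\hat{x}_1^{(1)}(k) = x_1^{(0)}(1)\,e^{-a(k-1)} + h(k) \sum_{\tau=2}^{k} \tfrac{1}{2}\left[e^{-a(k-\tau)} f(\tau) + e^{-a(k-\tau+1)} f(\tau-1)\right]$$

The function $h(k)$ is the unit step function given by:

$$h(k) = \begin{cases} 0, & k < 2 \\ 1, & k \geq 2 \end{cases}$$

Finally, the forecasts $\hat{x}_1^{(0)}(k)$ are determined by applying the inverse 1-AGO:

$$\hat{x}_1^{(0)}(k) = \hat{x}_1^{(1)}(k) - \hat{x}_1^{(1)}(k-1), \quad k = 2, 3, \ldots$$

GMC(1,N) is an evolution of both the conventional GM(1,1) and GM(1,N) models. We also make the following key remarks:

Remark 1. The driving-term sum on the RHS of Eq. (5) disappears if there is only one series ($N = 1$, known as a univariate model), and the model is reduced to:

$$\frac{dx_1^{(1)}(t)}{dt} + a\,x_1^{(1)}(t) = u$$

This DE is the GM(1,1) model's traditional image equation. From Eq. (10), we obtain the following by setting $f(t)$ equal to the fixed value $u$:

$$\hat{x}_1^{(1)}(t) = \left[x_1^{(0)}(1) - \frac{u}{a}\right] e^{-a(t-1)} + \frac{u}{a}$$

which is the GM(1,1) model's time response function. Least squares applied to Eq. (7) can still be used to estimate the parameters $a$ and $u$, except that $B$ is reduced to:

$$B = \begin{bmatrix} -z_1^{(1)}(2) & 1 \\ -z_1^{(1)}(3) & 1 \\ \vdots & \vdots \\ -z_1^{(1)}(n) & 1 \end{bmatrix}$$

Remark 2. If the control term is absent ($u = 0$), the model is reduced to:

$$\frac{dx_1^{(1)}(t)}{dt} + a\,x_1^{(1)}(t) = \sum_{i=2}^{N} b_i\,x_i^{(1)}(t) \qquad (13)$$

Eq. (13) is the GM(1,N) model's standard image equation. The RHS of Eq. (13) is viewed as a constant in the pioneers' works (see Ref. [5]). As a result, the cumulated forecast function of GM(1,N) may be derived exactly as that of the conventional GM(1,1):

$$\hat{x}_1^{(1)}(k) = \left[x_1^{(0)}(1) - \frac{1}{a}\sum_{i=2}^{N} b_i\,x_i^{(1)}(k)\right] e^{-a(k-1)} + \frac{1}{a}\sum_{i=2}^{N} b_i\,x_i^{(1)}(k)$$

with parameters $\theta = [a, b_2, \ldots, b_N]^T$. The least-squares technique can also be used to calculate the values of these parameters. $B$ also shows a difference, which is:

$$B = \begin{bmatrix} -z_1^{(1)}(2) & x_2^{(1)}(2) & \cdots & x_N^{(1)}(2) \\ -z_1^{(1)}(3) & x_2^{(1)}(3) & \cdots & x_N^{(1)}(3) \\ \vdots & \vdots & & \vdots \\ -z_1^{(1)}(n) & x_2^{(1)}(n) & \cdots & x_N^{(1)}(n) \end{bmatrix}$$

Based on Remarks 1 and 2, GMC(1,N) is theoretically better than the classical GM(1,N) in the following three aspects: … [4], and therefore relies on various system factors.

There are still some issues with structural rigidity in GMC(1,N). Although its structure has undergone many modifications, most works have failed to address the linear impact of $t$ on GMC(1,N)'s behaviour, which could explain why the model's predictive accuracy is poor.

(iii) Presence of noise in data: Finally, most studies have not taken into account the possibility that the input data may contain irrelevant information, which could also reduce the model's ability to predict outcomes accurately.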
In view of the previous analysis, it follows that GMC(1,N) remains flawed because its parameter estimation is inadequate and its structure is too simple to cope with real-world systems. These defects urge us to develop a GMC(1,N) that overcomes them.
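The GMC(1,N) procedure described above (1-AGO, least-squares estimation from the trapezoidal difference equation, trapezoidal evaluation of the convolution integral, inverse 1-AGO) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the parameter names `a`, `b_i` and `u`, the function names, and the toy data are all hypothetical, and at least one driving series (N ≥ 2) is assumed.

```python
import math

def ago(series):
    """1-AGO: running cumulative sum of the raw series."""
    total, out = 0.0, []
    for value in series:
        total += value
        out.append(total)
    return out

def solve_linear(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x

def fit_gmc(x1, drivers):
    """Estimate theta = [a, b_2, ..., b_N, u] by least squares.

    Returns theta together with the design matrix rows and targets,
    so that the fit can be inspected.
    """
    x1_ago = ago(x1)
    drv_ago = [ago(d) for d in drivers]
    rows, targets = [], []
    for k in range(1, len(x1)):
        z1 = 0.5 * (x1_ago[k] + x1_ago[k - 1])            # background value of x1
        zs = [0.5 * (d[k] + d[k - 1]) for d in drv_ago]   # background values of drivers
        rows.append([-z1] + zs + [1.0])
        targets.append(x1[k])
    p = len(rows[0])
    # Normal equations: (B^T B) theta = B^T Y
    btb = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    bty = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(p)]
    return solve_linear(btb, bty), rows, targets

def forecast_gmc(theta, x1_first, drivers):
    """Cumulated forecast by trapezoidal convolution, then inverse 1-AGO."""
    a, bs, u = theta[0], theta[1:-1], theta[-1]
    drv_ago = [ago(d) for d in drivers]
    n = len(drivers[0])
    f = [sum(b * d[k] for b, d in zip(bs, drv_ago)) + u for k in range(n)]
    cumulated = []
    for k in range(n):  # 0-based index: k = 0 corresponds to the first time point
        conv = sum(
            0.5 * (math.exp(-a * (k - t)) * f[t] + math.exp(-a * (k - t + 1)) * f[t - 1])
            for t in range(1, k + 1)
        )
        cumulated.append(x1_first * math.exp(-a * k) + conv)
    restored = [cumulated[0]] + [cumulated[k] - cumulated[k - 1] for k in range(1, n)]
    return cumulated, restored

# Toy data (illustrative only, not from the paper)
x1 = [10.0, 12.0, 15.0, 19.0]
x2 = [1.0, 2.0, 4.0, 7.0]
theta, rows, targets = fit_gmc(x1, [x2])
cumulated, restored = forecast_gmc(theta, x1[0], [x2])
```

Solving the normal equations directly keeps the sketch dependency-free; for ill-conditioned data a dedicated least-squares routine would be a safer choice.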

The proposed optimal model
We briefly discussed the GMC(1,N) model in Section 2.1 and demonstrated that it is in fact an upgrade of the conventional GM(1,N). It suffers from several glaring flaws, though. In this section, a novel optimal discrete multivariate GMC model (abbreviated ODGMC(1,N)) is developed as a solution to these flaws, enhancing the precision and stability of the GMC(1,N) model while also addressing their causes.

ODGMC(1,N) model
We start by introducing a new definition that is based on the concept examined by Ref. [6].
To improve the structure of GMC(1,N), an additional linear adjustment term proportional to $(t - 1)$ has been added to the RHS of Eq. (5). Many works on grey multivariate models (for example, Zeng et al. [6,7]) have come to the conclusion that this linear adjustment factor is crucial and can readily adjust the degree to which $x_1^{(0)}$ and $x_i^{(0)}$ $(i = 2, 3, \ldots, N)$ are related. However, in these earlier studies, the DE is not used; instead, the linear correction factor is included directly in the difference equation. Moreover, if we apply the same methodology used to derive the cumulated forecast function of GMC(1,N), we get an expression similar to Eq. (10), except that $f(t)$ now also contains the linear adjustment term. As a result, the parameter estimation mismatch flaw mentioned in Section 2.1 continues to exist. Fortunately, this mismatch issue can be eliminated by employing the discrete grey model methodology.
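For orientation, the effect of the linear adjustment term on the driving function can be written out explicitly. The block below is only a sketch: it assumes Eq. (5) has the usual GMC(1,N) form with development coefficient $a$, driving coefficients $b_i$ and control term $u$, and it uses $c$ as a placeholder symbol for the coefficient of the linear term, which is not necessarily the paper's notation.

```latex
% Assumed form of the GMC(1,N) equation extended by a linear term c (t - 1);
% the symbol c is a placeholder introduced for illustration only.
\frac{dx_1^{(1)}(t)}{dt} + a\,x_1^{(1)}(t)
  = \sum_{i=2}^{N} b_i\,x_i^{(1)}(t) + c\,(t-1) + u,
\qquad
f(t) = \sum_{i=2}^{N} b_i\,x_i^{(1)}(t) + c\,(t-1) + u .
```

Under these assumptions, setting $c = 0$ recovers the conventional GMC(1,N) driving function, which is why the extended model can only widen the range of behaviours the structure is able to represent.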

Evaluation of the predicted accumulated and original series
The following theorem describes the application of the recursive technique (similar to that of discrete grey models [8, 9]) in order to determine the ODGMC(1,N) model's cumulated forecast function.

Theorem: Under the initial condition $\hat{x}_1^{(1)}(1) = x_1^{(0)}(1)$, the cumulated forecast function of ODGMC(1,N), as specified in Eq. (19), is given by Eq. (22). The appendix shows the proof.

This Theorem demonstrates that Eq. (22) can be used to evaluate the 1-AGO series $\hat{x}_1^{(1)}$. As a result, the inverse AGO can be used to determine the forecasted series $\hat{x}_1^{(0)}$ as follows:

$$\hat{x}_1^{(0)}(k) = \hat{x}_1^{(1)}(k) - \hat{x}_1^{(1)}(k-1), \quad k = 2, 3, \ldots$$

Data filtration based on wavelet transform
ODGMC(1,N) may be corrupted by noise or useless information in the raw data. Wavelet transform (WT) can be used to clean this noise [10]. A wavelet is a mathematical function that separates various scale components from a continuous-time signal. Basically, WT is a band-pass filter with its bandwidth scaled to half at each level [11]. The scaling function filters the transform's lowest level and thereby makes sure that the entire spectrum is covered. When the signal is continuous, Eq. (23) applies; it is called the continuous wavelet transform (CWT), and its scale and translation parameters are both real-valued. For a discrete signal, a discrete WT (DWT) is calculated using Eq. (24), which depends on the sampling time, the scale factor and the number of samples. The most crucial element of the signal is its low-order (low-frequency) component, which captures the signal's identity. The signal's high-order (high-frequency) component, on the other hand, represents the signal's details.
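To make the filtering idea concrete, the sketch below denoises a short series with a one-level Haar DWT and soft thresholding of the detail coefficients. The choice of the Haar wavelet, the single decomposition level, the soft-threshold rule and the threshold value are all assumptions made for illustration; the text above does not state which wavelet or thresholding scheme was used, and the function names are hypothetical.

```python
import math

def haar_dwt(signal):
    """One-level Haar DWT: returns (approximation, detail) coefficients."""
    if len(signal) % 2:
        raise ValueError("this sketch needs an even number of samples")
    s = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal), 2)]
    return approx, detail

def haar_idwt(approx, detail):
    """Inverse of haar_dwt: rebuilds the signal from both coefficient sets."""
    s = math.sqrt(2.0)
    out = []
    for a, d in zip(approx, detail):
        out.extend([(a + d) / s, (a - d) / s])
    return out

def soft_threshold(coeffs, threshold):
    """Shrink each coefficient towards zero by `threshold` (assumed rule)."""
    return [math.copysign(max(abs(c) - threshold, 0.0), c) for c in coeffs]

def denoise(signal, threshold):
    """Keep the low-frequency part, shrink the high-frequency details."""
    approx, detail = haar_dwt(signal)
    return haar_idwt(approx, soft_threshold(detail, threshold))

# Toy series (illustrative only, not from the paper)
raw = [4.0, 6.0, 10.0, 12.0]
unchanged = denoise(raw, 0.0)     # zero threshold reproduces the input
smoothed = denoise(raw, 100.0)    # very large threshold removes all details
```

With a very large threshold every detail coefficient is removed and each pair of samples collapses to its mean, which illustrates the limitation noted for sharply fluctuating series: genuine variation can be smoothed away along with the noise.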

Data selection
Data used in this method paper cover the period 2000-2019 and were collected from the IEA and the World Bank's World Development Indicators (WDI). Specifically, the dataset on electricity consumption is from the IEA ( https://www.iea.org/ ), whereas the datasets on GDP per capita, household expenses and population size come from WDI ( https://databank.worldbank.org ) and are confirmed by the National Institute of Statistics ( https://ins-cameroun.cm/ ).

Performance evaluation
The reliability and forecast precision are evaluated using Root Mean Square Error (RMSE), Mean Square Error (MSE), Mean Absolute Percentage Error (MAPE) and Absolute Percentage Error (APE).
MAPE and APE disclose the models' predictive accuracy. MAPE in particular is a performance metric that compares the accuracy of forecasts based on relative errors, using absolute values to prevent positive and negative errors from cancelling each other out. Threshold values of MAPE are … MSE (Eq. (27)) is very often used as a loss function. It is calculated by summing the squared differences between the real $x_1^{(0)}(k)$ and forecasted $\hat{x}_1^{(0)}(k)$ electricity consumption over all the data points and dividing the result by the number of data points.
RMSE (Eq. (28)) is the square root of MSE. RMSE acts much like the MSE except that it is prone to inflate significant deviations [13], and this may be useful when comparing competing models.
The best performing model is indicated by RMSE, MSE, MAPE and APE scores that are closest to zero. However, we focus more on MAPE because it is a metric that comes up very often in forecasting studies. MAPE is usually expressed as a percentage error, making it easy to grasp and to compare the accuracy of a model across data sets and case studies.

… data were hidden to prevent any leakage in order to verify whether the models were overfitting or underfitting. Simulation outcomes are displayed in Table 2, whereas Fig. 1 provides a visual representation of these results. There is evidence of the failure of the conventional GM(1,1) to accurately capture the system's evolution law.
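The four metrics described in this section can be sketched in a few lines of Python. The formulas are the standard textbook definitions, which are assumed here to match the paper's own equations; the function names and the sample numbers are illustrative only and are not taken from the paper's data.

```python
import math

def ape(actual, forecast):
    """Absolute percentage error of each data point, in percent."""
    return [abs(a - f) / abs(a) * 100.0 for a, f in zip(actual, forecast)]

def mape(actual, forecast):
    """Mean of the absolute percentage errors, in percent."""
    errors = ape(actual, forecast)
    return sum(errors) / len(errors)

def mse(actual, forecast):
    """Mean of the squared differences between actual and forecast values."""
    return sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual)

def rmse(actual, forecast):
    """Square root of the MSE, expressed in the units of the data."""
    return math.sqrt(mse(actual, forecast))

# Toy values (illustrative only, not from the paper)
actual = [100.0, 200.0, 400.0]
forecast = [110.0, 190.0, 400.0]
scores = {
    "MAPE": mape(actual, forecast),
    "MSE": mse(actual, forecast),
    "RMSE": rmse(actual, forecast),
}
```

Because APE divides by the actual value, this sketch assumes that no observation equals zero, which holds for annual electricity consumption.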
GM(1,N) predictions (yellow curve) fit moderately well in the modelling phase (Table 2) but deviate completely in the validation phase, meaning the model is overfitting. Of all the alternative models, only GMC(1,N) manages to compete with the new ODGMC(1,N) model. It can be seen from Fig. 1 that the new ODGMC(1,N) manages to correctly track the system's evolution in both the modelling and validation phases.
APE distribution (see Fig. 2) for the training and validation phases shows that the residuals of the ODGMC(1,N) model are much smaller than those of GM(1,1), GM(1,N) and GMC(1,N), which reiterates the superiority of the ODGMC(1,N) model. The MAPE criterion in particular, with values of 2.74% and 1.99% in the modelling and validation phases respectively, demonstrates that ODGMC(1,N) is a class I model and can compete with high-accuracy models.

Conclusion and future works
A new approach is developed in this paper with the aim of improving the predictive accuracy of GMC(1,N). More specifically, an optimal version abbreviated ODGMC(1,N) is formulated and implemented in this work. This forecasting model is based on the concepts of discrete GMs and on wavelet data filtering techniques. The novel ODGMC(1,N) corrects two flaws in the modelling structure of GMC(1,N) and is able to uncover the links that exist between a series and its drivers. The structure of the novel model and its parameterisation are also carefully considered. The prediction of Cameroon's annual electricity demand is used to demonstrate that ODGMC(1,N) has a significantly higher forecast precision and improved stability compared with competing models. Moreover, ODGMC(1,N) succeeds in achieving this with a limited number of explanatory variables. The forecasting outcomes confirm that ODGMC(1,N) succeeds in predicting the demand for electricity in Cameroon with MAPE, RMSE, and MSE of 1.74%, 132.16, and 17,465.21, respectively, categorizing it as a very high precision model. These results are due to:
• The stability of ODGMC(1,N) resulting from the consistency between parameter estimation and the way the parameters are implemented.
• The addition of a term that takes into account the linear impact of t on the model's performance.
• The removal of irrelevant information from input data by wavelet transform filtration.
One limitation of the proposed model is that it cannot fully extract information from seasonal series that exhibit sharp fluctuations. This is because the WT will perceive these variations as noise when they are not, and once the WT has damped what it takes to be a disturbance, the system's information will be lost, resulting in poor forecasts.

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments
We are grateful to the Department of Logistics and Transport Engineering and the Department of Energetics and Thermal Engineering, University Institute of Technology, for providing the necessary facilities and support.
Funding: This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.